Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model
Abstract
This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out with maximum likelihood estimation of linear transforms for the means, precisions (inverse covariances) and the feature-space under the EMLLT model. This paper shows the first experimental evidence that significant word-error-rate improvements can be achieved with the EMLLT model (in both VTL and VTL+SAT training contexts) over a state-of-the-art diagonal covariance model in a difficult large-vocabulary conversational speech recognition task. The improvements were of the order of 1% absolute in multiple scenarios.
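The abstract does not spell out the model structure. As background, and as an assumption rather than something stated above: the EMLLT model is usually described as giving each Gaussian a precision matrix that is a weighted sum of rank-one terms built from directions shared across all Gaussians, with D >= d directions, where D = d corresponds to the ordinary MLLT case. A minimal sketch under that assumption follows; the names `emllt_precision`, `gaussian_loglik`, `A`, and `lam` are illustrative and do not come from the paper.

```python
import numpy as np


def emllt_precision(A, lam):
    """Return P = sum_k lam[k] * outer(a_k, a_k).

    A   : (D, d) array; row k is the shared direction a_k (assumed form).
    lam : (D,) array of weights specific to one Gaussian.
    """
    # Scale each row a_k by lam[k], then contract over k.
    return A.T @ (lam[:, None] * A)


def gaussian_loglik(x, mean, precision):
    """Log-density of x under a Gaussian with the given mean and precision."""
    d = x.shape[0]
    # Cholesky raises LinAlgError if the precision is not positive definite.
    chol = np.linalg.cholesky(precision)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    diff = x - mean
    return 0.5 * (logdet - d * np.log(2.0 * np.pi) - diff @ precision @ diff)


# Toy example: d = 3 feature dimensions, D = 5 shared directions.
rng = np.random.default_rng(0)
d, D = 3, 5
A = rng.standard_normal((D, d))
lam = rng.uniform(0.5, 2.0, size=D)  # positive weights keep P positive definite
P = emllt_precision(A, lam)
score = gaussian_loglik(rng.standard_normal(d), np.zeros(d), P)
```

Because the directions are shared, only the D weights in `lam` are stored per Gaussian, which is what lets such a model sit between diagonal and full covariance in parameter count.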
Related papers
Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices
This paper applies the recently proposed SPAM models for acoustic modeling in a Speaker Adaptive Training (SAT) context on large vocabulary conversational speech databases, including the Switchboard database. SPAM models are Gaussian mixture models in which a subspace constraint is placed on the precision and mean matrices (although this paper focuses on the case of unconstrained means). They i...
Linear Transforms in Automatic Speech Recognition: Estimation Procedures and Integration of Diverse Acoustic Data
Linear transforms have been used extensively for both training and adaptation of Hidden Markov Model (HMM) based automatic speech recognition (ASR) systems. Two important applications of linear transforms in acoustic modeling are the decorrelation of the feature vector and the constrained adaptation of the acoustic models to the speaker, the channel, and the task. Our focus in the first part of...
Generalized discriminative feature transformation for speech recognition
We propose a new algorithm called Generalized Discriminative Feature Transformation (GDFT) for acoustic models in speech recognition. GDFT is based on Lagrange relaxation on a transformed optimization problem. We show that the existing discriminative feature transformation methods like feature space MMI/MPE (fMMI/MPE), region dependent linear transformation (RDLT), and a non-discriminative feat...
The AT&T large vocabulary conversational speech recognition system
We describe the AT&T recognition system used in the DARPA Large Vocabulary Conversational Speech Recognition (LVCSR98) evaluation. It is based on multi-pass rescoring of weighted Finite State Machines (FSMs) using progressively more accurate acoustic models. Acoustic models used in the system are all gender independent. They are based on three state context-dependent hidden Markov models using G...
Maximum Likelihood Linear Transformations for HMM
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-b...